Journals
  Publication Years
  Keywords
Search within results Open Search
Please wait a minute...
For Selected: Toggle Thumbnails
Over-sampling algorithm for imbalanced datasets
CUI Xin, XU Hua, SU Chen
Journal of Computer Applications    2020, 40 (6): 1662-1667.   DOI: 10.11772/j.issn.1001-9081.2019101817
Abstract347)      PDF (749KB)(378)       Save

In Synthetic Minority Over-sampling TEchnique (SMOTE), noise samples may participate in the synthesis of new samples, so it is difficult to guarantee the rationality of the new samples. Aiming at this problem, combining clustering algorithm, an improved algorithm called Clustered Synthetic Minority Over-sampling TEchnique (CSMOTE) was proposed. In the algorithm, the idea of the linear interpolation between the nearest neighbors was abandoned, and the linear interpolation between the cluster centers of minority classes and the samples of corresponding clusters was used to synthesize new samples. And the samples involved in the synthesis were screened to reduce the possibility of noise samples participating in the synthesis. On six actual datasets, CSMOTE algorithm was compared with four SMOTE’s improved algorithms and two under-sampling algorithms for many times, and CSMOTE algorithm obtained the highest AUC values on all datasets. Experimental results show that CSMOTE algorithm has higher classification performance and can effectively solve the problem of unbalanced sample distribution in the datasets.

Reference | Related Articles | Metrics